There is some confusion about how students are classified as full time and part time in these datasets. According to the Montgomery College website, a full-time student is one who attempts 12 or more credits, and a part-time student is one who attempts fewer than 12 credits. I will now reclassify students according to these definitions and observe the differences in the distribution.
In this part of my project I will refine my research questions. I will further examine the effects of the pandemic on recent MCPS high school graduates enrolled at Montgomery College. For the purposes of this study I will limit my dataset to MCPS students aged 20 and younger. These MCPS students will be divided further into subgroups based on gender and race. The datasets used in this part of my project have already been cleaned in my initial data analysis. Outliers have not been removed. I will conduct my statistical analysis with and without the outliers.
For the purposes of this project the following variables and definitions are important.
The population in this dataset is the incoming cohorts of students in Fall 2019 and Fall 2020. These students are first-time degree or certificate seekers and have no prior tertiary education. They may have earned AP credits in high school.
Fall2019 refers to the incoming freshman cohort in Fall2019. This is term year 2020.
Fall2020 refers to the incoming freshman cohort in Fall2020. This is term year 2021.
Variables of Interest:
term_year: Incoming students in Fall 2019 are assigned to term year 2020. Incoming students in Fall 2020 are assigned to term year 2021.
hours_earned: Credit hours the student has earned in their first Fall semester (this can include credits earned in Summer school second session, Summer 1, and AP credits earned in high school).
hours_attempted: Credit and non-credit hours the student has attempted in their first Fall semester (this may include credits attempted in Summer school second session, Summer 1).
full_part: Whether the student is full time (FT) or part time (PT). Part-time students are registered for fewer than 12 credit hours. Full-time students take at least 12 credits.
major: Degree programme the student is registered for, or certificate&LR (letter of recommendation). All certificates and letters of recommendation have been grouped together.
hours_earned_rate: Ratio of hours_earned/hours_attempted.
age: Age of the student at the start of the program.
race: Racial classification of the student.
sex: Gender classification of the student.
high_school: Name of the high school the student graduated from. Public high schools in Montgomery County are classified as MCPS.
pell: Whether the student receives a Pell Grant or not.
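To make the hours_earned_rate definition concrete, here is a minimal sketch on made-up rows. The data frame `toy` and its values are hypothetical and are not taken from the dataset. The third row illustrates how the ratio can exceed 1 when earned hours include credits, such as AP credits, that were not attempted in the same semester.

```r
# Hypothetical rows for illustration only; not taken from the dataset.
toy <- data.frame(
  hours_attempted = c(12, 15, 6),
  hours_earned    = c(12, 9, 9)   # third row: earned > attempted, e.g. AP credits
)

# Ratio as defined above: hours_earned / hours_attempted
toy$hours_earned_rate <- toy$hours_earned / toy$hours_attempted

print(toy)
```

A rate of 1 means every attempted hour was earned; values below 1 mean some attempted hours were not earned.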
Summary of Data and Types
skim(df_Degrees)
| Name | df_Degrees |
| Number of rows | 7123 |
| Number of columns | 24 |
| _______________________ | |
| Column type frequency: | |
| character | 15 |
| logical | 1 |
| numeric | 8 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| sex | 0 | 1.00 | 1 | 1 | 0 | 4 | 0 |
| race | 0 | 1.00 | 5 | 22 | 0 | 9 | 0 |
| age | 0 | 1.00 | 4 | 7 | 0 | 5 | 0 |
| high_school | 0 | 1.00 | 7 | 30 | 0 | 163 | 0 |
| full_part | 0 | 1.00 | 2 | 2 | 0 | 2 | 0 |
| city | 19 | 1.00 | 5 | 19 | 0 | 127 | 0 |
| stat_code | 19 | 1.00 | 2 | 2 | 0 | 16 | 0 |
| pell_grant | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| camp_code | 140 | 0.98 | 1 | 1 | 0 | 6 | 0 |
| major | 0 | 1.00 | 1 | 61 | 0 | 34 | 0 |
| pass_engl | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| pass_math | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| summer1 | 0 | 1.00 | 1 | 1 | 0 | 1 | 0 |
| fall | 0 | 1.00 | 1 | 1 | 0 | 1 | 0 |
| HS_classify | 0 | 1.00 | 2 | 14 | 0 | 7 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| MCPS | 0 | 1 | 0.7 | TRU: 4963, FAL: 2160 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| u_number | 0 | 1 | 20196625.60 | 5027.06 | 20190001 | 20191872.50 | 20193733.00 | 20201703.5 | 20203588.0 | ▇▃▁▂▇ |
| zip | 19 | 1 | 20886.64 | 1559.40 | 1460 | 20853.00 | 20877.00 | 20903.0 | 94025.0 | ▁▇▁▁▁ |
| hours_attempted | 0 | 1 | 12.46 | 6.23 | 1 | 9.00 | 12.00 | 15.0 | 54.0 | ▆▇▁▁▁ |
| hours_earned | 0 | 1 | 7.85 | 7.43 | 0 | 3.00 | 6.00 | 12.0 | 54.0 | ▇▃▁▁▁ |
| mc_gpa | 0 | 1 | 2.19 | 1.47 | 0 | 0.67 | 2.50 | 3.5 | 4.0 | ▆▂▃▅▇ |
| term_year | 0 | 1 | 2020.47 | 0.50 | 2020 | 2020.00 | 2020.00 | 2021.0 | 2021.0 | ▇▁▁▁▇ |
| hours_earned_rate | 0 | 1 | 0.57 | 0.38 | 0 | 0.23 | 0.64 | 1.0 | 3.2 | ▇▇▁▁▁ |
| unearned_hours | 0 | 1 | 4.61 | 4.24 | -22 | 0.00 | 4.00 | 7.0 | 25.0 | ▁▁▇▂▁ |
Change Datatypes
df_Degrees$u_number<- as.character(df_Degrees$u_number)
df_Degrees$term_year<- as.character(df_Degrees$term_year)
Use the data frame df_Degrees, which was cleaned in the initial data analysis. Keep only MCPS students who are 20 years old or younger, then recompute full_part from hours_attempted.
df_MCPS20D<-df_Degrees %>%
filter(HS_classify=="MCPS")%>% # filter degrees dataset to obtain students who graduated MCPS highschools
filter(age=='18 - 20' | age =="< 18") # filter students who are 20yrs old and younger.
df_MCPS20D<-df_MCPS20D %>%
select(.,-c("full_part"))
df_MCPS20D<-df_MCPS20D %>%
mutate(full_part = ifelse(hours_attempted<12,"PT","FT"))
The data frame df_MCPS20D now contains the students who graduated from MCPS high schools and are 20 years old or younger.
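One way to confirm that the recomputed label follows the 12-credit rule is to cross-tabulate it against the threshold. The sketch below does this on made-up values; the data frame `toy` is hypothetical, and the same check could be run on df_MCPS20D.

```r
# Hypothetical hours_attempted values for illustration only.
toy <- data.frame(hours_attempted = c(3, 11, 12, 18))

# Same rule as above: fewer than 12 attempted credits is part time.
toy$full_part <- ifelse(toy$hours_attempted < 12, "PT", "FT")

# Cross-tabulate the label against the threshold.
# If the rule was applied correctly, FT/TRUE and PT/FALSE are both 0.
check <- table(label = toy$full_part, under_12 = toy$hours_attempted < 12)
print(check)
```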
Frequency of Students Part time versus Full time: 2020 vs 2021
# Number of students part time and full time 2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=full_part, fill=full_part)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=2,size=3)+
facet_wrap(~term_year)+
ggtitle("Number of Students Full time versus Part time")+
ylab('Frequency')+
xlab("")+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
# change in overall MCPS student population from 2020 to 2021
df_MCPS20D%>%
group_by(term_year,full_part)%>%
count(full_part)%>%
group_by(term_year)%>%
mutate(total_pop =sum(n))%>%
group_by(full_part)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 4 x 5
## # Groups: full_part [2]
## term_year full_part n total_pop pct_change
## <chr> <chr> <int> <int> <dbl>
## 1 2020 FT 1495 2456 NA
## 2 2021 FT 1497 2303 0.134
## 3 2020 PT 961 2456 NA
## 4 2021 PT 806 2303 -16.1
Full-time enrollment of MCPS graduates was essentially unchanged in term year 2021 (a 0.13% increase, from 1495 to 1497 students). Part-time enrollment decreased by 16.1% (from 961 to 806 students). Overall, the cohort shrank from 2456 to 2303 students, a 6.2% decrease.
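The percentage changes in the output table above can be reproduced by hand from the counts. The sketch below copies the counts from that table; the helper name `pct_change` is my own and is not part of the analysis code.

```r
# Counts copied from the output table above.
ft <- c(`2020` = 1495, `2021` = 1497)
pt <- c(`2020` = 961,  `2021` = 806)

# Hypothetical helper: percentage change relative to the earlier year,
# the same formula as (n - lag(n)) / lag(n) * 100 in the pipeline above.
pct_change <- function(x) unname((x["2021"] - x["2020"]) / x["2020"] * 100)

round(pct_change(ft), 2)       # full time
round(pct_change(pt), 2)       # part time
round(pct_change(ft + pt), 2)  # all students (uses total_pop: 2456 and 2303)
```

Because the change is always measured against the 2020 count, groups with very small 2020 counts produce large percentage swings from only a few students.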
Count of Race Groups
ggplot(data=df_MCPS20D, aes(x=race, fill=race)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0,size=3)+
facet_wrap(~term_year + full_part)+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())+
ggtitle("Number of Students per a Race Group")+
xlab("Race")+
ylab("Frequency")
Full time student: Change in enrollment from 2020 to 2021 based on Race
# calculate percentage change in full time student enrollment from 2020 to 2021 by race
df_MCPS20D%>%
filter(full_part=="FT")%>%
group_by(term_year,race)%>%
count(race)%>%
group_by(race)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 18 x 4
## # Groups: race [9]
## term_year race n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native 4 NA
## 2 2021 Am. Indian / AK Native 1 -75
## 3 2020 Asian 252 NA
## 4 2021 Asian 217 -13.9
## 5 2020 Black / African Am. 341 NA
## 6 2021 Black / African Am. 307 -9.97
## 7 2020 Foreign 98 NA
## 8 2021 Foreign 98 0
## 9 2020 Hawaiian / Pac. Isl. 4 NA
## 10 2021 Hawaiian / Pac. Isl. 3 -25
## 11 2020 Hispanic 482 NA
## 12 2021 Hispanic 569 18.0
## 13 2020 Multi-Race 66 NA
## 14 2021 Multi-Race 60 -9.09
## 15 2020 Unknown 10 NA
## 16 2021 Unknown 3 -70
## 17 2020 White 238 NA
## 18 2021 White 239 0.420
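Full time students: There was a 13.9% decline in Asian students, a 10.0% decline in Black / African American students and a 9.1% decline in Multi-Race students. Hispanic students increased by 18.0%. White (+0.4%) and Foreign (0%) enrollment was essentially unchanged. The remaining groups have 10 or fewer students per year, so their percentage changes should be read with caution.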
Part time student: Change in enrollment from 2020 to 2021 based on Race
# calculate percentage change in part time student enrollment from 2020 to 2021 by race
df_MCPS20D%>%
filter(full_part=="PT")%>%
group_by(term_year,race)%>%
count(race)%>%
group_by(race)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 18 x 4
## # Groups: race [9]
## term_year race n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native 5 NA
## 2 2021 Am. Indian / AK Native 1 -80
## 3 2020 Asian 89 NA
## 4 2021 Asian 73 -18.0
## 5 2020 Black / African Am. 225 NA
## 6 2021 Black / African Am. 200 -11.1
## 7 2020 Foreign 78 NA
## 8 2021 Foreign 52 -33.3
## 9 2020 Hawaiian / Pac. Isl. 2 NA
## 10 2021 Hawaiian / Pac. Isl. 1 -50
## 11 2020 Hispanic 379 NA
## 12 2021 Hispanic 290 -23.5
## 13 2020 Multi-Race 38 NA
## 14 2021 Multi-Race 38 0
## 15 2020 Unknown 6 NA
## 16 2021 Unknown 2 -66.7
## 17 2020 White 139 NA
## 18 2021 White 149 7.19
Part time students: There was an 18.0% decrease in Asian students, a 33.3% decrease in Foreign students, an 11.1% decrease in Black / African American students and a 23.5% decrease in Hispanic students. White students increased by 7.2%, and Multi-Race enrollment was unchanged.
Gender of Students
# Gender of students part time and full time 2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=sex, fill=sex)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=1,size=3)+
facet_wrap(~term_year+full_part)+
ggtitle("Gender of Students: Full time versus Part time")+
ylab('Frequency')+
xlab("")+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
Calculate percentage change in full time student enrollment from 2020 to 2021 by gender
# calculate percentage change in full time student enrollment from 2020 to 2021 by gender
df_MCPS20D%>%
filter(full_part=="FT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,sex)%>%
count(sex)%>%
group_by(sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 4 x 4
## # Groups: sex [2]
## term_year sex n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 F 732 NA
## 2 2021 F 794 8.47
## 3 2020 M 745 NA
## 4 2021 M 687 -7.79
Full time students: Enrollment of female students increased by 8.5% (732 to 794). Enrollment of male students decreased by 7.8% (745 to 687).
Calculate percentage change in part time student enrollment from 2020 to 2021 by gender
# calculate percentage change in part time student enrollment from 2020 to 2021 by gender
df_MCPS20D%>%
filter(full_part=="PT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,sex)%>%
count(sex)%>%
group_by(sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 4 x 4
## # Groups: sex [2]
## term_year sex n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 F 442 NA
## 2 2021 F 370 -16.3
## 3 2020 M 498 NA
## 4 2021 M 427 -14.3
Part time students: There was a 16.3% decrease in female students (442 to 370) and a 14.3% decrease in male students (498 to 427).
Gender and Race breakdown of full time students
# Gender and Race of full time students 2020 vs 2021
df_MCPS20D%>%
filter(sex %in% c("F","M"))%>%
filter(full_part=="FT")%>%
ggplot(., aes(x=race, fill=race)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, size=3)+
facet_wrap(~term_year+sex)+
ggtitle("Gender and Race of Full time Students")+
ylab('Frequency')+
xlab("")+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
Full time Student Enrollment Percentages trend by Gender and race
# calculate percentage change in student enrollment from 2020 to 2021 by race and gender
# create data frames with counts of full time students by race and gender
df_MCPS20D%>%
filter(full_part=="FT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,race,sex)%>%
count(sex)%>%
group_by(race,sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 35 x 5
## # Groups: race, sex [18]
## term_year race sex n pct_change
## <chr> <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native F 3 NA
## 2 2020 Am. Indian / AK Native M 1 NA
## 3 2021 Am. Indian / AK Native M 1 0
## 4 2020 Asian F 106 NA
## 5 2021 Asian F 110 3.77
## 6 2020 Asian M 144 NA
## 7 2021 Asian M 105 -27.1
## 8 2020 Black / African Am. F 159 NA
## 9 2021 Black / African Am. F 160 0.629
## 10 2020 Black / African Am. M 174 NA
## # … with 25 more rows
Part time Student Enrollment Percentages trend by Gender and race
# calculate percentage change in part time student enrollment from 2020 to 2021 by race and gender
# count part time students by race and gender in each term year
df_MCPS20D%>%
filter(full_part=="PT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,race,sex)%>%
count(sex)%>%
group_by(race,sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 33 x 5
## # Groups: race, sex [18]
## term_year race sex n pct_change
## <chr> <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native F 1 NA
## 2 2020 Am. Indian / AK Native M 4 NA
## 3 2021 Am. Indian / AK Native M 1 -75
## 4 2020 Asian F 35 NA
## 5 2021 Asian F 24 -31.4
## 6 2020 Asian M 52 NA
## 7 2021 Asian M 49 -5.77
## 8 2020 Black / African Am. F 98 NA
## 9 2021 Black / African Am. F 93 -5.10
## 10 2020 Black / African Am. M 124 NA
## # … with 23 more rows
Overall Majors trend
Count of Majors in Full time students in 2020
z1<- df_MCPS20D%>%
filter(full_part=="FT" &term_year =="2020")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Full-time Students in 2020 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z1 + coord_flip()
Count of Majors in Full time students in 2021
z13<- df_MCPS20D%>%
filter(full_part=="FT" &term_year =="2021")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Full-time Students in 2021 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z13 + coord_flip()
Calculate percentage change in full time student majors from 2020 to 2021
df_MCPS20D%>%
filter(full_part=="FT")%>%
group_by(term_year,major)%>%
count(major)%>%
group_by(term_year)%>%
group_by(major)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 62 x 4
## # Groups: major [33]
## term_year major n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 0 3 NA
## 2 2021 0 2 -33.3
## 3 2020 American Sign Language 5 NA
## 4 2021 American Sign Language 1 -80
## 5 2020 Applied Geography 1 NA
## 6 2021 Applied Geography 1 0
## 7 2020 Architectural Technology 12 NA
## 8 2021 Architectural Technology 16 33.3
## 9 2020 Art 25 NA
## 10 2021 Art 22 -12
## # … with 52 more rows
Count of Majors in Part time students in 2020
z11<- df_MCPS20D%>%
filter(full_part=="PT" &term_year =="2020")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Part-time Students in 2020 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z11 + coord_flip()
Count of Majors in Part time students in 2021
z12<- df_MCPS20D%>%
filter(full_part=="PT" &term_year =="2021")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Part-time Students in 2021 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z12 + coord_flip()
Calculate percentage change in part time student majors from 2020 to 2021
df_MCPS20D%>%
filter(full_part=="PT")%>%
group_by(term_year,major)%>%
count(major)%>%
group_by(term_year)%>%
group_by(major)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 60 x 4
## # Groups: major [31]
## term_year major n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 0 5 NA
## 2 2020 American Sign Language 1 NA
## 3 2021 American Sign Language 2 100
## 4 2020 Applied Geography 2 NA
## 5 2021 Applied Geography 1 -50
## 6 2020 Architectural Technology 16 NA
## 7 2021 Architectural Technology 7 -56.2
## 8 2020 Art 11 NA
## 9 2021 Art 14 27.3
## 10 2020 Broadcast Media 5 NA
## # … with 50 more rows
Breakdown of Highschools Full time students in term year 2020 attended in MCPS
df_MCPS20D%>%
filter(full_part=="FT" & term_year=="2020")%>%
group_by(term_year,high_school)%>%
count(high_school)%>%
group_by(term_year)%>%
mutate(total_pop =sum(n))%>%
group_by(high_school)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_pop= (n/total_pop*100))%>%
arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups: high_school [25]
## term_year high_school n total_pop pct_pop
## <chr> <chr> <int> <int> <dbl>
## 1 2020 Gaithersburg High School 114 1495 7.63
## 2 2020 Montgomery Blair High School 100 1495 6.69
## 3 2020 Northwest HS - Germantown 89 1495 5.95
## 4 2020 Paint Branch High School 85 1495 5.69
## 5 2020 Springbrook Sr High School 83 1495 5.55
## 6 2020 Colonel Zadok Magruder HS 72 1495 4.82
## 7 2020 Wheaton High School 72 1495 4.82
## 8 2020 Clarksburg High School 69 1495 4.62
## 9 2020 James Hubert Blake High School 69 1495 4.62
## 10 2020 Watkins Mill High School 69 1495 4.62
## # … with 15 more rows
Breakdown of Highschools Full time students in term year 2021 attended in MCPS
df_MCPS20D%>%
filter(full_part=="FT" & term_year=="2021")%>%
group_by(term_year,high_school)%>%
count(high_school)%>%
group_by(term_year)%>%
mutate(total_pop =sum(n))%>%
group_by(high_school)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_pop= (n/total_pop*100))%>%
arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups: high_school [25]
## term_year high_school n total_pop pct_pop
## <chr> <chr> <int> <int> <dbl>
## 1 2021 Montgomery Blair High School 93 1497 6.21
## 2 2021 Wheaton High School 91 1497 6.08
## 3 2021 Paint Branch High School 84 1497 5.61
## 4 2021 Gaithersburg High School 83 1497 5.54
## 5 2021 Colonel Zadok Magruder HS 80 1497 5.34
## 6 2021 Northwest HS - Germantown 79 1497 5.28
## 7 2021 Richard Montgomery High School 75 1497 5.01
## 8 2021 Watkins Mill High School 74 1497 4.94
## 9 2021 Clarksburg High School 70 1497 4.68
## 10 2021 Sherwood High School 68 1497 4.54
## # … with 15 more rows
# calculate percentage change in full time student enrollment from 2020 to 2021 by MCPS highschool
df_MCPS20D%>%
filter(full_part=="FT")%>%
group_by(term_year,high_school)%>%
count(high_school)%>%
group_by(term_year)%>%
group_by(high_school)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)%>%
arrange(desc(pct_change))
## # A tibble: 50 x 4
## # Groups: high_school [25]
## term_year high_school n pct_change
## <chr> <chr> <int> <dbl>
## 1 2021 Walt Whitman High School 21 50
## 2 2021 Rockville High School 66 46.7
## 3 2021 Bethesda Chevy Chase High Schl 42 35.5
## 4 2021 Sherwood High School 68 33.3
## 5 2021 Seneca Valley High School 55 27.9
## 6 2021 Wheaton High School 91 26.4
## 7 2021 Thomas Sprigg Wootton High Sch 34 25.9
## 8 2021 Richard Montgomery High School 75 17.2
## 9 2021 Colonel Zadok Magruder HS 80 11.1
## 10 2021 Watkins Mill High School 74 7.25
## # … with 40 more rows
v1<- df_MCPS20D %>%
group_by(term_year,full_part) %>%
filter(full_part=="FT" & term_year=="2020")%>%
count(high_school) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = high_school, y = prop)) +
geom_col(aes(fill=high_school), position = "dodge") +
geom_text(aes(label = scales::percent(prop,0.5),
y = prop,
group = high_school),
position = position_dodge(width = 0.9),
vjust = 0, size=3, hjust=0)+
# facet_wrap(~term_year )+
ggtitle("High schools attended by full-time students, term year 2020")+
ylab('Proportion ')+
xlab("")+
theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
v1+ coord_flip()
v1<- df_MCPS20D %>%
group_by(term_year,full_part) %>%
filter(full_part=="FT" & term_year=="2021")%>%
count(high_school) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = high_school, y = prop)) +
geom_col(aes(fill=high_school), position = "dodge") +
geom_text(aes(label = scales::percent(prop,0.5),
y = prop,
group = high_school),
position = position_dodge(width = 0.9),
vjust = 0, size=3, hjust=0)+
# facet_wrap(~term_year )+
ggtitle("High schools attended by full-time students, term year 2021")+
ylab('Proportion ')+
xlab("")+
theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
v1+ coord_flip()
Breakdown of Highschools Part time students in term year 2020 attended in MCPS
df_MCPS20D%>%
filter(full_part=="PT" & term_year=="2020")%>%
group_by(term_year,high_school)%>%
count(high_school)%>%
group_by(term_year)%>%
mutate(total_pop =sum(n))%>%
group_by(high_school)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_pop= (n/total_pop*100))%>%
arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups: high_school [25]
## term_year high_school n total_pop pct_pop
## <chr> <chr> <int> <int> <dbl>
## 1 2020 Gaithersburg High School 73 961 7.60
## 2 2020 John F. Kennedy High School 66 961 6.87
## 3 2020 Northwest HS - Germantown 64 961 6.66
## 4 2020 Montgomery Blair High School 59 961 6.14
## 5 2020 Albert Einstein HS & MC Art Cn 50 961 5.20
## 6 2020 Clarksburg High School 50 961 5.20
## 7 2020 Richard Montgomery High School 50 961 5.20
## 8 2020 Paint Branch High School 48 961 4.99
## 9 2020 Wheaton High School 43 961 4.47
## 10 2020 Quince Orchard Sr High School 42 961 4.37
## # … with 15 more rows
Breakdown of Highschools Part time students in term year 2021 attended in MCPS
df_MCPS20D%>%
filter(full_part=="PT" & term_year=="2021")%>%
group_by(term_year,high_school)%>%
count(high_school)%>%
group_by(term_year)%>%
mutate(total_pop =sum(n))%>%
group_by(high_school)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_pop= (n/total_pop*100))%>%
arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups: high_school [25]
## term_year high_school n total_pop pct_pop
## <chr> <chr> <int> <int> <dbl>
## 1 2021 Northwest HS - Germantown 55 806 6.82
## 2 2021 Gaithersburg High School 54 806 6.70
## 3 2021 Northwood High School 45 806 5.58
## 4 2021 Paint Branch High School 44 806 5.46
## 5 2021 Montgomery Blair High School 43 806 5.33
## 6 2021 Colonel Zadok Magruder HS 42 806 5.21
## 7 2021 Quince Orchard Sr High School 39 806 4.84
## 8 2021 Albert Einstein HS & MC Art Cn 38 806 4.71
## 9 2021 Richard Montgomery High School 36 806 4.47
## 10 2021 Clarksburg High School 35 806 4.34
## # … with 15 more rows
# calculate percentage change in part time student enrollment from 2020 to 2021 by MCPS high school
df_MCPS20D%>%
filter(full_part=="PT")%>%
group_by(term_year,high_school)%>%
count(high_school)%>%
group_by(term_year)%>%
group_by(high_school)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)%>%
arrange(desc(pct_change))
## # A tibble: 50 x 4
## # Groups: high_school [25]
## term_year high_school n pct_change
## <chr> <chr> <int> <dbl>
## 1 2021 Northwood High School 45 45.2
## 2 2021 Poolesville Jr-Sr High School 14 40
## 3 2021 Walter Johnson High School 35 40
## 4 2021 Thomas Sprigg Wootton High Sch 23 27.8
## 5 2021 Colonel Zadok Magruder HS 42 16.7
## 6 2021 Winston Churchill High School 14 16.7
## 7 2021 James Hubert Blake High School 33 3.12
## 8 2021 Quince Orchard Sr High School 39 -7.14
## 9 2021 Paint Branch High School 44 -8.33
## 10 2021 Damascus High School 20 -9.09
## # … with 40 more rows
v3<- df_MCPS20D %>%
group_by(term_year,full_part) %>%
filter(full_part=="PT" & term_year=="2020")%>%
count(high_school) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = high_school, y = prop)) +
geom_col(aes(fill=high_school), position = "dodge") +
geom_text(aes(label = scales::percent(prop,0.5),
y = prop,
group = high_school),
position = position_dodge(width = 0.9),
vjust = 0, size=3, hjust=0)+
# facet_wrap(~term_year )+
ggtitle("High schools attended by part-time students, term year 2020")+
ylab('Proportion ')+
xlab("")+
theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
v3+ coord_flip()
v4<- df_MCPS20D %>%
group_by(term_year,full_part) %>%
filter(full_part=="PT" & term_year=="2021")%>%
count(high_school) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = high_school, y = prop)) +
geom_col(aes(fill=high_school), position = "dodge") +
geom_text(aes(label = scales::percent(prop,0.5),
y = prop,
group = high_school),
position = position_dodge(width = 0.9),
vjust = 0, size=3, hjust=0)+
# facet_wrap(~term_year )+
ggtitle("High schools attended by part-time students, term year 2021")+
ylab('Proportion ')+
xlab("")+
theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
v4 + coord_flip()
For the purposes of this analysis I will run the analysis first with outliers and then after removing outliers.
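The text does not say how outliers will be identified. One common choice is Tukey's rule, which flags values more than 1.5 times the interquartile range beyond the quartiles; the default whiskers of geom_boxplot follow the same 1.5 x IQR convention. The sketch below applies that rule to made-up values. The helper name `is_outlier` and the data are hypothetical, and the rule itself is an assumption on my part, not something the analysis states.

```r
# Hypothetical helper: TRUE where a value lies more than k * IQR beyond the quartiles.
is_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Made-up hours_attempted values for illustration only.
hours <- c(12, 12, 13, 13, 14, 15, 15, 16, 18, 52)

flagged <- is_outlier(hours)
hours[flagged]            # values that would be treated as outliers
summary(hours[!flagged])  # summary of the remaining values
```

Running every summary once on the full vector and once on `hours[!flagged]` is one way to carry out the with-and-without comparison.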
Boxplots of hours_attempted by year by MCPS students 20yrs and younger
p11 = ggplot(df_MCPS20D, aes(hours_attempted))
p11 + geom_boxplot(aes(colour = term_year)) +
facet_wrap(~full_part)
Students who register for more than 18 credits require special permission from the department. Furthermore, a full-time student is classified as someone who is enrolled in 12 or more credits, and a part-time student as someone who is enrolled in fewer than 12 credits. However, based on the dataset, a number of students originally labelled full time attempted fewer than 12 credits, and a large number of students originally labelled part time attempted 12 or more. This is why full_part was recomputed from hours_attempted above.
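The two thresholds mentioned above (12 credits for full time, more than 18 credits requiring permission) split students into three load bands. A minimal sketch with made-up values follows; the vector `hours` and the band labels are hypothetical, and the break points assume whole-number credit hours.

```r
# Made-up hours_attempted values for illustration only.
hours <- c(6, 11, 12, 15, 18, 19, 24)

# Intervals are right-closed: (0, 11], (11, 18], (18, Inf].
# With whole-number credits this matches "< 12", "12 to 18", and "> 18".
band <- cut(hours,
            breaks = c(0, 11, 18, Inf),
            labels = c("PT (<12)", "FT (12-18)", "Over 18"))

table(band)
```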
Boxplots of hours_attempted by year by Full time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of hours_attempted by year by Part time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.
Density plot of hours_attempted by year
ggplot(df_MCPS20D, aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~full_part)+
xlab("Hours attempted") +
ylab( "Density")+
ggtitle(" Hours Attempted by Full-time Students vs Part-time Students")
Hours attempted by full time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours attempted") +
ylab( "Density") +
ggtitle(" Hours Attempted by Full-time Students")
Fivenum Summary of Full time students
df_MCPS20D%>% filter(full_part=="FT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_attempted)[1],
Q1 = fivenum(hours_attempted)[2],
median = fivenum(hours_attempted)[3],
Q3 = fivenum(hours_attempted)[4],
max = fivenum(hours_attempted)[5],
mean= mean(hours_attempted),
sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK N… 2020 4 13 15 18 27.5 36 21.2 10.1
## 2 Am. Indian / AK N… 2021 1 13 13 13 13 13 13 NA
## 3 Asian 2020 252 12 13 15 21 52 18.5 8.13
## 4 Asian 2021 217 12 13 15 18 46 17.3 6.89
## 5 Black / African A… 2020 341 12 12 13 15 42 14.2 3.38
## 6 Black / African A… 2021 307 12 13 14 16 38 15.4 4.05
## 7 Foreign 2020 98 12 13 14 18 31 15.6 3.98
## 8 Foreign 2021 98 12 13 15 16 37 16.1 5.26
## 9 Hawaiian / Pac. I… 2020 4 12 12.5 13 14 15 13.2 1.26
## 10 Hawaiian / Pac. I… 2021 3 12 15.5 19 24.5 30 20.3 9.07
## 11 Hispanic 2020 482 12 12 13 16 39 15.0 4.21
## 12 Hispanic 2021 569 12 13 14 16 43 15.6 4.36
## 13 Multi-Race 2020 66 12 12 14 18 44 17.3 8.02
## 14 Multi-Race 2021 60 12 12.5 14 17 43 16.2 6.27
## 15 Unknown 2020 10 12 12 14 15 31 15.6 5.72
## 16 Unknown 2021 3 12 12 12 13 14 12.7 1.15
## 17 White 2020 238 12 12 14 18 46 16.9 7.17
## 18 White 2021 239 12 13 15 18 54 17.0 6.50
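Two details of this summary are easy to misread. fivenum() returns Tukey's hinges, which can differ slightly from the quartiles that quantile() returns by default, and sd() of a single observation is NA, which is why the group with n = 1 shows NA in the sd column. A small sketch with made-up values:

```r
x <- c(1, 2, 3, 4)            # made-up values for illustration only

fivenum(x)                    # min, lower hinge, median, upper hinge, max
quantile(x, c(0.25, 0.75))    # default (type 7) quartiles

sd(13)                        # a single observation has no sample standard deviation
```

For groups with hundreds of students the hinges and the default quartiles are nearly identical; the difference matters mainly for the very small groups.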
Hours attempted by part time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours attempted") +
ylab( "Density")+
ggtitle(" Hours Attempted by Part-time Students")
Fivenum Summary of Part time students
df_MCPS20D%>% filter(full_part=="PT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_attempted)[1],
Q1 = fivenum(hours_attempted)[2],
median = fivenum(hours_attempted)[3],
Q3 = fivenum(hours_attempted)[4],
max = fivenum(hours_attempted)[5],
mean= mean(hours_attempted),
sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK N… 2020 5 3 3 6 8 9 5.8 2.77
## 2 Am. Indian / AK N… 2021 1 6 6 6 6 6 6 NA
## 3 Asian 2020 89 2 7 9 10 11 8.26 2.57
## 4 Asian 2021 73 3 8 9 10 11 8.45 2.47
## 5 Black / African A… 2020 225 1 6 9 10 11 7.78 2.56
## 6 Black / African A… 2021 200 1 6 8 10 11 7.61 2.69
## 7 Foreign 2020 78 3 6 8 10 11 7.58 2.46
## 8 Foreign 2021 52 3 5 9 10 11 7.71 2.71
## 9 Hawaiian / Pac. I… 2020 2 6 6 7.5 9 9 7.5 2.12
## 10 Hawaiian / Pac. I… 2021 1 5 5 5 5 5 5 NA
## 11 Hispanic 2020 379 1 6 8 10 11 7.75 2.41
## 12 Hispanic 2021 290 1 6 9 10 11 8.11 2.52
## 13 Multi-Race 2020 38 1 5 9 9 11 7.26 2.87
## 14 Multi-Race 2021 38 3 6 9 10 11 8 2.45
## 15 Unknown 2020 6 7 9 9.5 10 10 9.17 1.17
## 16 Unknown 2021 2 4 4 6.5 9 9 6.5 3.54
## 17 White 2020 139 1 6 9 10 11 7.94 2.65
## 18 White 2021 149 3 5 8 10 11 7.60 2.71
Boxplots of Hours Earned by year by MCPS students 20yrs and younger
p11 = ggplot(df_MCPS20D, aes(hours_earned))
p11 + geom_boxplot(aes(colour = term_year)) +
facet_wrap(~full_part)
Boxplots of hours_earned by year by Full time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of hours_earned by year by Part time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.
Density plot of hours_earned by year
ggplot(df_MCPS20D, aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~full_part)+
xlab("Hours Earned") +
ylab( "Density")+
ggtitle(" Hours Earned by Full-time vs Part-time Students")
Hours_earned by full time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned") +
ylab( "Density")+
ggtitle(" Hours Earned by Full-time Students")
Five-number summary of hours_earned for full-time students
df_MCPS20D%>% filter(full_part=="FT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_earned)[1],
Q1 = fivenum(hours_earned)[2],
median = fivenum(hours_earned)[3],
Q3 = fivenum(hours_earned)[4],
max = fivenum(hours_earned)[5],
mean= mean(hours_earned),
sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK N… 2020 4 10 12 16.5 27.5 36 19.8 11.4
## 2 Am. Indian / AK N… 2021 1 13 13 13 13 13 13 NA
## 3 Asian 2020 252 0 10 13 18 52 15.6 9.18
## 4 Asian 2021 217 0 9 13 16 46 14.0 8.47
## 5 Black / African A… 2020 341 0 6 9 12 42 9.44 5.57
## 6 Black / African A… 2021 307 0 6 9 13 37 10.0 6.57
## 7 Foreign 2020 98 0 6 11 14 31 10.9 6.71
## 8 Foreign 2021 98 0 6 10 13 37 10.7 7.87
## 9 Hawaiian / Pac. I… 2020 4 0 4.5 10.5 12.5 13 8.5 5.92
## 10 Hawaiian / Pac. I… 2021 3 9 12.5 16 23 30 18.3 10.7
## 11 Hispanic 2020 482 0 6 10 13 38 10.3 6.57
## 12 Hispanic 2021 569 0 6 11 13 42 10.5 6.49
## 13 Multi-Race 2020 66 0 7 12 15 44 13.9 9.82
## 14 Multi-Race 2021 60 0 6 11 15 43 11.6 8.79
## 15 Unknown 2020 10 3 4 9.5 14 31 11 8.18
## 16 Unknown 2021 3 3 5 7 9.5 12 7.33 4.51
## 17 White 2020 238 0 7 12 16 46 13.3 9.09
## 18 White 2021 239 0 9 12 16 54 13.0 8.29
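A note on reading these tables: the Q1 and Q3 columns come from `fivenum()`, which returns Tukey's hinges rather than the default `quantile()` quartiles, so the two can differ slightly, especially in small groups. The sketch below illustrates this on an invented vector and also shows why `sd` is `NA` for groups with a single student. The helper name `five_num_row` is hypothetical; it calls `fivenum()` once rather than five times.

```r
# Invented values, used only to show how the two quartile definitions differ.
x <- c(1, 2, 3, 4)

hinges    <- fivenum(x)                  # min, lower hinge, median, upper hinge, max
quartiles <- quantile(x, c(0.25, 0.75))  # default (type 7) quartiles

print(hinges)     # lower hinge is 1.5, upper hinge is 3.5
print(quartiles)  # 1.75 and 3.25

# Hypothetical helper: compute fivenum() once and name the pieces.
five_num_row <- function(v) {
  f <- fivenum(v)
  c(n = length(v), min = f[1], Q1 = f[2], median = f[3], Q3 = f[4], max = f[5],
    mean = mean(v), sd = sd(v))
}

print(five_num_row(x))

# A group with one observation has no sample standard deviation.
single_sd <- sd(13)
print(single_sd)  # NA
```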
Density plot of hours_earned for part-time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned") +
ylab( "Density")+
ggtitle(" Hours Earned by Part-time Students")
Five-number summary of hours_earned for part-time students
df_MCPS20D%>% filter(full_part=="PT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_earned)[1],
Q1 = fivenum(hours_earned)[2],
median = fivenum(hours_earned)[3],
Q3 = fivenum(hours_earned)[4],
max = fivenum(hours_earned)[5],
mean= mean(hours_earned),
sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK … 2020 5 0 0 3 3 3 1.8 1.64
## 2 Am. Indian / AK … 2021 1 3 3 3 3 3 3 NA
## 3 Asian 2020 89 0 1 5 7 11 4.64 3.59
## 4 Asian 2021 73 0 3 3 6 11 4.27 3.29
## 5 Black / African … 2020 225 0 0 3 6 11 3.14 3.07
## 6 Black / African … 2021 200 0 0 3 6 11 2.74 3.22
## 7 Foreign 2020 78 0 0 3 6 11 3.71 3.38
## 8 Foreign 2021 52 0 0 0 6 11 2.69 3.32
## 9 Hawaiian / Pac. … 2020 2 0 0 0 0 0 0 0
## 10 Hawaiian / Pac. … 2021 1 3 3 3 3 3 3 NA
## 11 Hispanic 2020 379 0 0 3 6 11 3.37 3.21
## 12 Hispanic 2021 290 0 0 3 6 11 3.69 3.19
## 13 Multi-Race 2020 38 0 1 3 9 11 4.45 3.85
## 14 Multi-Race 2021 38 0 0 3 6 10 3.34 3.16
## 15 Unknown 2020 6 0 1 2.5 6 9 3.5 3.51
## 16 Unknown 2021 2 3 3 3.5 4 4 3.5 0.707
## 17 White 2020 139 0 0 5 7 11 4.58 3.56
## 18 White 2021 149 0 2 4 6 11 4.34 3.33
Boxplots of GPA by term year for MCPS students aged 20 and younger
p11 <- ggplot(df_MCPS20D, aes(mc_gpa))
p11 + geom_boxplot(aes(colour = term_year)) +
facet_wrap(~full_part)
Boxplots of GPA by term year for full-time MCPS students aged 20 and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of GPA by term year for part-time MCPS students aged 20 and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Density plot of GPA by year
ggplot(df_MCPS20D, aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~full_part)+
xlab("GPA") +
ylab( "Density")+
ggtitle(" GPA by Full-time vs Part-time Students")
Density plot of GPA for full-time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("GPA") +
ylab( "Density")+
ggtitle(" GPA of Full-time Students")
Five-number summary of GPA for full-time students
df_MCPS20D%>% filter(full_part=="FT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(mc_gpa)[1],
Q1 = fivenum(mc_gpa)[2],
median = fivenum(mc_gpa)[3],
Q3 = fivenum(mc_gpa)[4],
max = fivenum(mc_gpa)[5],
mean= mean(mc_gpa),
sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK … 2020 4 2.35 2.62 3.2 3.75 4 3.19 0.717
## 2 Am. Indian / AK … 2021 1 2.77 2.77 2.77 2.77 2.77 2.77 NA
## 3 Asian 2020 252 0 2.5 3.31 3.75 4 2.98 1.01
## 4 Asian 2021 217 0 2.5 3.23 3.71 4 2.90 1.09
## 5 Black / African … 2020 341 0 1.62 2.5 3.18 4 2.34 1.13
## 6 Black / African … 2021 307 0 1.33 2.67 3.43 4 2.33 1.29
## 7 Foreign 2020 98 0 2 3 3.69 4 2.66 1.26
## 8 Foreign 2021 98 0 1.19 2.82 3.69 4 2.42 1.40
## 9 Hawaiian / Pac. … 2020 4 0 1.12 2.46 3.22 3.77 2.17 1.58
## 10 Hawaiian / Pac. … 2021 3 1.75 2.22 2.68 3.34 4 2.81 1.13
## 11 Hispanic 2020 482 0 1.5 2.8 3.46 4 2.45 1.24
## 12 Hispanic 2021 569 0 1.5 2.69 3.4 4 2.36 1.27
## 13 Multi-Race 2020 66 0 2 2.75 3.5 4 2.64 1.09
## 14 Multi-Race 2021 60 0 1.5 2.79 3.58 4 2.47 1.31
## 15 Unknown 2020 10 0.33 2 2.46 3.4 4 2.57 1.05
## 16 Unknown 2021 3 2.55 2.65 2.75 3.38 4 3.1 0.786
## 17 White 2020 238 0 1.8 3 3.63 4 2.62 1.23
## 18 White 2021 239 0 2 3 3.70 4 2.68 1.25
Density plot of GPA for part-time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("GPA") +
ylab( "Density")+
ggtitle(" GPA of Part-time Students")
Five-number summary of GPA for part-time students
df_MCPS20D%>% filter(full_part=="PT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(mc_gpa)[1],
Q1 = fivenum(mc_gpa)[2],
median = fivenum(mc_gpa)[3],
Q3 = fivenum(mc_gpa)[4],
max = fivenum(mc_gpa)[5],
mean= mean(mc_gpa),
sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK … 2020 5 0 0 1 1.5 3 1.1 1.24
## 2 Am. Indian / AK … 2021 1 2 2 2 2 2 2 NA
## 3 Asian 2020 89 0 0.67 2.33 3.3 4 2.08 1.45
## 4 Asian 2021 73 0 0.67 2 3.5 4 2.01 1.51
## 5 Black / African … 2020 225 0 0 1.33 2.75 4 1.50 1.38
## 6 Black / African … 2021 200 0 0 0.635 2.46 4 1.21 1.35
## 7 Foreign 2020 78 0 0 2 3 4 1.78 1.49
## 8 Foreign 2021 52 0 0 0 2.67 4 1.26 1.46
## 9 Hawaiian / Pac. … 2020 2 0 0 0 0 0 0 0
## 10 Hawaiian / Pac. … 2021 1 4 4 4 4 4 4 NA
## 11 Hispanic 2020 379 0 0 1.5 3 4 1.63 1.45
## 12 Hispanic 2021 290 0 0 1.58 3 4 1.64 1.43
## 13 Multi-Race 2020 38 0 0.67 2 3.5 4 1.98 1.50
## 14 Multi-Race 2021 38 0 0 2 3 4 1.69 1.53
## 15 Unknown 2020 6 0 0.75 2.16 3.67 4 2.12 1.57
## 16 Unknown 2021 2 3 3 3.5 4 4 3.5 0.707
## 17 White 2020 139 0 0 2.33 3.26 4 1.96 1.48
## 18 White 2021 149 0 0.33 2.57 3.33 4 2.14 1.49
## Hours Earned Rate
Density plot of Hours Earned Rate by year
ggplot(df_MCPS20D, aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.3) +
facet_wrap(~full_part)+
xlab("Hours Earned Rate") +
ylab( "Density")+
xlim(0,1)
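One caveat on `xlim(0, 1)`: in ggplot2, setting limits this way drops observations outside the range before the density is estimated (with a warning), whereas `coord_cartesian(xlim = c(0, 1))` only zooms the view. It may therefore be worth checking first how many rates fall outside the interval. The sketch below does that on invented values. It assumes `hours_earned_rate` is `hours_earned / hours_attempted`, which is an assumption because the definition is not shown in this section.

```r
# Invented values; in the analysis these would be columns of df_MCPS20D.
hours_attempted <- c(12, 15, 13, 6, 9, 0)
hours_earned    <- c(12, 9, 16, 6, 0, 0)

# Assumed definition of the rate (not confirmed by this section).
rate <- hours_earned / hours_attempted

# Values that would not appear in a plot restricted with xlim(0, 1):
# rates outside [0, 1], plus undefined rates such as 0 / 0.
outside <- is.na(rate) | rate < 0 | rate > 1
n_outside <- sum(outside)

print(data.frame(hours_attempted, hours_earned, rate, outside))
print(n_outside)
```

If the count is not negligible, reporting it alongside the plot makes clear how much of the cohort the restricted density actually describes.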
Boxplots of Hours Earned Rate by term year for full-time MCPS students aged 20 and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of Hours Earned Rate by term year for part-time MCPS students aged 20 and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Density plot of Hours Earned Rate for full-time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned Rate") +
ylab( "Density")+
ggtitle(" Hours Earned Rate of Full-time Students")
Density plot of Hours Earned Rate for part-time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned Rate") +
ylab( "Density")+
ggtitle(" Hours Earned Rate of Part-time Students")
library(GGally)
# Pairwise distributions and correlations: full-time students, term year 2020
df_MCPS20D%>% filter(term_year=="2020")%>%
filter(full_part=="FT")%>%
ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))
library(GGally)
# Pairwise distributions and correlations: full-time students, term year 2021
df_MCPS20D%>% filter(term_year=="2021")%>%
filter(full_part=="FT")%>%
ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))
library(GGally)
# Pairwise distributions and correlations: part-time students, term year 2020
df_MCPS20D%>% filter(term_year=="2020")%>%
filter(full_part=="PT")%>%
ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))
library(GGally)
# Pairwise distributions and correlations: part-time students, term year 2021
df_MCPS20D%>% filter(term_year=="2021")%>%
filter(full_part=="PT")%>%
ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))
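By default `ggpairs()` shows Pearson correlations for continuous variables in its upper panels. The same coefficients can be obtained as a matrix with base R's `cor()`, which is convenient for comparing the four subsets numerically. The sketch below is a minimal illustration on invented data; the data frame `toy` stands in for one filtered subset of `df_MCPS20D`, and the rate definition is an assumption.

```r
# Invented data standing in for one term_year / full_part subset of df_MCPS20D.
toy <- data.frame(
  hours_attempted = c(12, 13, 15, 12, 16, 14),
  hours_earned    = c(12, 10, 15,  6, 16,  7),
  mc_gpa          = c(3.5, 2.8, 3.9, 1.7, 4.0, 2.0)
)
toy$hours_earned_rate <- toy$hours_earned / toy$hours_attempted  # assumed definition

cols <- c("hours_attempted", "hours_earned", "mc_gpa", "hours_earned_rate")

# Pearson correlation matrix; use = "complete.obs" skips rows with missing values.
cor_matrix <- cor(toy[, cols], use = "complete.obs")
print(round(cor_matrix, 2))
```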